Overview

In this document we describe with simple example and model how neural networks work, in order to understand the technology that is enabling Artificial Intelligence implementation.

What we will show

We will build a Neural Network capable of classifying the Iris flower according to the three species Setosa, Versicolor and Virginica. We selected Iris species as it is commonly used to test classification algorithm.

As we want to focus on Sinmple Neural Network to keep it easy and understandable, we will base our classification on length and width of Petal and Sepal, according to the Iris Data set - see attribute information box -. It is in fact known that one can classify the Iris type based on those “features”. In contrast we will not develop an image recognition Convolutional Neural Network as we want to make the Neural Net simple and understandable.

FIGURE 1 - IRIS

“FIGURE 1 - IRIS”

So our input is the Iris Data set summarized here.

summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species  
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199                  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800                  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500                  

How we will do it

We will use a standard LEARN and TEST approach, like at school: you have exercises with solutions you go through to learn and verify you are learning, and then you have the test phase where you have the exercise and the teacher will verify your result. This approach is used with Neural Netowrk as well and it is called Supervised Learning. The learning phase is also called training phase.

For the training phase we will use a portion of the Iris Data set and the remaining part of the data set will be used for the testing phase. Let’s consider 66% of the data for training and 33% of the data for testing. As the entire data set consists of 150 rows, we will use 100 rows as exercises to train like in homework and 50 rows as exercises to test like in class tests.

So our “student” that is the Neural Network - remember that it doesn’t know anything about Iris Species - will be trained with 100 exercises, During this phase it will learn, just by “looking” at the example, you don’t need to explain anything. After this phase we will test it with the reamaining 50 exercises, but this time we will provide to the Neural Network only the Petal, Sepal, length, width and will ask to classify the Iris. We will then check the answers against the solution that we have.

# set.seed(10)
# iris[sample(1:150,10),]

During this phase it will learn, just by “looking” at the example, you don’t need to explain anything. After this phase we will test it with the reamaining 50 exercises, but this time we will provide to the Neural Network only the Petal, Sepal, length, width and will ask to classify the Iris. We will then check the answers against the solution that we have. So the test will be performed on 50 exercises. See the following picture:

FIGURE 2 - TRAINING AND TESTING PHASES

“FIGURE 2 - TRAINING AND TESTING PHASES”

# set.seed(150)
# Species = rep("?",10)
# cbind(iris[sample(1:150,10),1:4],Species)

So after the training we will ask the Neural Network to replace the question mark with the proper answer: Setosa, Versicolor, Virginica.

Doesn’t sound a bit magic?

That’s the magic of learning by Neural Network. Think about the magic of a child learning a language just by listening, and sometimes getting the parents correcting him. Neural Network works in a similar way, as they try to mimic how the brain works.

But now let’s see what’s a Neural Network.

What a Neural Network is

A Neural Network consists of neural units connected among them and connected with the world through input and output. So basically Neural Network has one Input layer, one outpur layer and several hidden layers. In the case all neural units are connected to all neural units of the following layer. Such kind of Neural Network is a Feed Forward Neural Network, in contrast to convolutional and recursive Neural Network that have a different architecture. Here we will focus on Feed Forward Neural Network.Each neural unit receives several numerical input, weight each input and sum the input, before sending to the output the numerical value is processed by a non linear function that is called activation function as showed in the following picture:

Neural Network Architecture

“Neural Network Architecture”

So the Neural Network is basically defined by:

Now let’s classify the Iris!

As first step we have to create the Neural Network capable of classifing the Iris species. We will be using a standard method based on supervised learning, exactly using 66% of the Iris Data Set above, the 33% Data Set will be use to Validate the Neural Network.

So basicall the Neural Network will learn from examples and than will be evaluated with a classification test.

THe study is based on the R package nnet, that is a simple package that support only one hidden layer but it is powerful enough to train good models and for tutorial reasons.

Let’s split the Iris data in trainig set to learn and testing set to test, according to 66% - 33% ratio.

set.seed(180)
i <- sample(1:150,100)
iris_train <- iris[i,]
iris_test <- iris[-i,]
summary(iris_train)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width         Species  
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.10   setosa    :36  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.500   1st Qu.:0.20   versicolor:37  
 Median :5.700   Median :3.000   Median :4.100   Median :1.30   virginica :27  
 Mean   :5.811   Mean   :3.084   Mean   :3.607   Mean   :1.11                  
 3rd Qu.:6.325   3rd Qu.:3.400   3rd Qu.:5.025   3rd Qu.:1.80                  
 Max.   :7.900   Max.   :4.400   Max.   :6.700   Max.   :2.50                  

Now we need 4 neural nodes in input fed by the Sepal and Petal length and width, and 3 neural nodes in output to specify the probability of each species. Let’s consider only one neural node in the hidden layer (only one hidden layer is considered here for simplicity), the activation function we consider is the sigmoid and then the resulting Neural Network looks as follow:

library(devtools)
source_url('https://gist.githubusercontent.com/fawda123/7471137/raw/466c1474d0a505ff044412703516c34f1a4684a5/nnet_plot_update.r')
SHA-1 hash of file is 74c80bd5ddbc17ab3ae5ece9c0ed9beb612e87ef
library(nnet)
set.seed(18)
irnn <- nnet(Species ~ ., data = iris_train, size = 1, maxit = 0)
# weights:  11
plot.nnet(irnn)
Loading required package: scales
Loading required package: reshape

So it basically model creation at this point consists of “tuning” the 11 weights and the bias value. This is obatain in an iterative way, starting with random numbers and then changing them with a method called backpropagation in order to minimize the output error, so to minimize the misclassification. The iris_train data is used several time to obtain the convergence. Let’s have a look at how such network works before “learning”, so before the weights are tuned. It means basically that we are using random numbers.

irpred <- predict(irnn, iris_train, type = "class")
table(iris_train$Species, irpred)
            irpred
             setosa
  setosa         36
  versicolor     37
  virginica      27
sum(iris_train$Species == irpred) / length(iris_train$Species)
[1] 0.36

Unsurprisingly the accuracy is 36%, that is basically random choice of the Iris type. Now lets apply one cycle learning and see if it improves, so if it learns.

library(nnet)
set.seed(18)
irnn <- nnet(Species ~ ., data = iris_train, size = 1, maxit = 1)
# weights:  11
initial  value 111.716872 
final  value 111.000802 
stopped after 2 iterations
irpred <- predict(irnn, iris_train, type = "class")
table(iris_train$Species, irpred)
            irpred
             setosa
  setosa         36
  versicolor     37
  virginica      27
sum(iris_train$Species == irpred) / length(iris_train$Species)
[1] 0.36

Well it seems that the first learning cycle didn’t improve, so let the Neural Network play and play again with the data set as Romans said “Repetita Iuvant”: repeating things help! Let’s apply the default nnet package 100 repetition.

set.seed(18)
irnn <- nnet(Species ~ ., data = iris_train, size = 1)
# weights:  11
initial  value 111.716872 
iter  10 value 48.716977
iter  20 value 43.578253
final  value 43.577037 
converged
irpred <- predict(irnn, iris_train, type = "class")
table(iris_train$Species, irpred)
            irpred
             setosa versicolor
  setosa         36          0
  versicolor      0         37
  virginica       0         27
sum(iris_train$Species == irpred) / length(iris_train$Species)
[1] 0.73

Good the Neural Network improved a lot! Now it classifies the Iris species with 73% accuracy! At least in the iris_train. But now let’s see if it can understand Species from a new data set, never seen before, not used for traning, the iris_test.

irpred <- predict(irnn, iris_test, type = "class")
table(iris_test$Species, irpred)
            irpred
             setosa versicolor
  setosa         14          0
  versicolor      0         13
  virginica       0         23
sum(iris_test$Species == irpred) / length(iris_test$Species)
[1] 0.54

Good! We are 54% with only one neural node. Let’s have a look at it, note how the weights changed as you can see with the size of the links between neural nodes.

plot.nnet(irnn)

Now let’s try with a more complex Neural Network, in this case we increase the number of Neural Nodes in the hidden layer to 3

set.seed(18)
irnn <- nnet(Species ~ ., data = iris_train, size = 3)
# weights:  27
initial  value 124.056864 
iter  10 value 27.920209
iter  20 value 0.210898
iter  30 value 0.002851
final  value 0.000071 
converged
irpred <- predict(irnn, iris_train, type = "class")
table(iris_train$Species, irpred)
            irpred
             setosa versicolor virginica
  setosa         36          0         0
  versicolor      0         37         0
  virginica       0          0        27
sum(iris_train$Species == irpred) / length(iris_train$Species)
[1] 1

Very good improvement! We are now at 100% in the training set. This is not surprisingly we imporved, in fact the second models has higher number of neural nodes and connections among them. So basically we evolved the species! Let’s test it against testing set.

irpred <- predict(irnn, iris_test, type = "class")
table(iris_test$Species, irpred)
            irpred
             setosa versicolor virginica
  setosa         14          0         0
  versicolor      0         11         2
  virginica       0          0        23
sum(iris_test$Species == irpred) / length(iris_test$Species)
[1] 0.96

Doing very well! 96% Accuracy. Let’s have a look at the Neural Net:

plot.nnet(irnn)

Conclusion

We have trained a simple Neural Network based on available data that teaches how to classify the Iris species based on sepal-petal length and width. The training consisted of tuning weigths in the Neural Network by going through the trainig set several time. This process is pretty similar to a human learning process, where the man is trained with a controlled environemnt and he can check his results with actual results.

After the training is completed, meaning that the man is making a limited number of errors, he is ready to face real cases, that is cases where he doesn’t know the answer and he has to apply what he learnt.

This is exaclty what we did here with the simple Neural Net exercize. Now the Neural Net can classify Iris. Note also the effect of more neural nodes in the hidden layer that improved the accuracy from 54% to 96%. Note also that an untrained netowork is providing random guess.

---
title: 'Artificial Intelligence: a Neural Network Tutorial'
author: "Luca Vignali"
output:
  html_notebook: default
  html_document: default
---

# Overview
In this document we describe with simple example and model how neural networks work, in order to understand the technology that is enabling Artificial Intelligence implementation.

# What we will show
We will build a Neural Network capable of classifying the Iris flower according to the three species Setosa, Versicolor and Virginica. 
We selected Iris species as it is commonly used to test classification algorithm.

As we want to focus on Sinmple Neural Network to keep it easy and understandable, we will base our classification on length and width of Petal and Sepal, according to the Iris Data set - see attribute information box -. It is in fact known that one can classify the Iris type based on those "features".
In contrast we will not develop an image recognition Convolutional Neural Network as we want to make the Neural Net simple and understandable.

!["FIGURE 1 - IRIS"](./IRIS.png) 

 


So our input is the Iris Data set summarized here.

```{r}
summary(iris)
```

# How we will do it
We will use a standard LEARN and TEST approach, like at school: you have exercises with solutions you go through to learn and verify you are learning, and then you have the test phase where you have the exercise and the teacher will verify your result. This approach is used with Neural Netowrk as well and it is called Supervised Learning. The learning phase is also called training phase.

For the training phase we will use a portion of the Iris Data set and the remaining part of the data set will be used for the testing phase. Let's consider 66% of the data for training and 33% of the data for testing. As the entire data set consists of 150 rows, we will use 100 rows as exercises to train like in homework and 50 rows as exercises to test like in class tests.  

So our "student" that is the Neural Network - remember that it doesn't know anything about Iris Species - will be trained with 100 exercises, During this phase it will learn, just by "looking" at the example, you don't need to explain anything. After this phase we will test it with the reamaining 50 exercises, but this time we will provide to the Neural Network only the Petal, Sepal, length, width and will ask to classify the Iris. We will then check the answers against the solution that we have.

```{r}
# set.seed(10)
# iris[sample(1:150,10),]
```


During this phase it will learn, just by "looking" at the example, you don't need to explain anything. After this phase we will test it with the reamaining 50 exercises, but this time we will provide to the Neural Network only the Petal, Sepal, length, width and will ask to classify the Iris. We will then check the answers against the solution that we have.
So the test will be performed on 50 exercises. See the following picture:

!["FIGURE 2 - TRAINING AND TESTING PHASES"](./Training-Testing.png) 

```{r}
# set.seed(150)
# Species = rep("?",10)
# cbind(iris[sample(1:150,10),1:4],Species)
```


So after the training we will ask the Neural Network to replace the question mark with the proper answer: Setosa, Versicolor, Virginica.

Doesn't sound a bit magic? 

That's the magic of learning by Neural Network. Think about the magic of a child learning a language just by listening, and sometimes getting the parents correcting him.
Neural Network works in a similar way, as they try to mimic how the brain works.

But now let's see what's a Neural Network.


# What a Neural Network is
A Neural Network consists of neural units connected among them and connected with the world through input and output. So basically Neural Network has one Input layer, one outpur layer and several hidden layers. In the case all neural units are connected to all neural units of the following layer. Such kind of Neural Network is a Feed Forward Neural Network, in contrast to convolutional and recursive Neural Network that have a different architecture. Here we will focus on Feed Forward Neural Network.Each neural unit receives several numerical input, weight each input and sum the input, before sending to the output the numerical value is processed by a non linear function that is called activation function as showed in the following picture:

!["Neural Network Architecture"](./NeuralNetwork.PNG)


So the Neural Network is basically defined by:

* Number of Neurons in each layer.
* How many hidden layers are present.
* The weigth of each interconnection.
* The activation function at each neuron.
* The bias input to each layer - excluding input layer


# Now let's classify the Iris!

As first step we have to create the Neural Network capable of classifing the Iris species. We will be using a standard method based on supervised learning, exactly using 66% of the Iris Data Set above, the 33% Data Set will be use to Validate the Neural Network.

So basicall the Neural Network will learn from examples and than will be evaluated with a classification test.

THe study is based on the R package nnet, that is a simple package that support only one hidden layer but it is powerful enough to train good models and for tutorial reasons.

Let's split the Iris data in trainig set to learn and testing set to test, according to 66% - 33% ratio.

```{r}
set.seed(180)
i <- sample(1:150,100)
iris_train <- iris[i,]
iris_test <- iris[-i,]
summary(iris_train)
```

Now we need 4 neural nodes in input fed by the Sepal and Petal length and width, and 3 neural nodes in output to specify the probability of each species.
Let's consider only one neural node in the hidden layer (only one hidden layer is considered here for simplicity), the activation function we consider is the sigmoid and then the resulting Neural Network looks as follow:

```{r}
library(devtools)
source_url('https://gist.githubusercontent.com/fawda123/7471137/raw/466c1474d0a505ff044412703516c34f1a4684a5/nnet_plot_update.r')
library(nnet)
set.seed(18)
irnn <- nnet(Species ~ ., data = iris_train, size = 1, maxit = 0)
plot.nnet(irnn)

```

So it basically model creation at this point consists of "tuning" the 11 weights and the bias value. This is obatain in an iterative way, starting with random numbers and then changing them with a method called backpropagation in order to minimize the output error, so to minimize the misclassification. The iris_train data is used several time to obtain the convergence.
Let's have a look at how such network works before "learning", so before the weights are tuned. It means basically that we are using random numbers.

```{r}
irpred <- predict(irnn, iris_train, type = "class")
table(iris_train$Species, irpred)
sum(iris_train$Species == irpred) / length(iris_train$Species)

```

Unsurprisingly the accuracy is 36%, that is basically random choice of the Iris type. Now lets apply one cycle learning and see if it improves, so if it learns.

```{r}
library(nnet)
set.seed(18)
irnn <- nnet(Species ~ ., data = iris_train, size = 1, maxit = 1)
irpred <- predict(irnn, iris_train, type = "class")

table(iris_train$Species, irpred)
sum(iris_train$Species == irpred) / length(iris_train$Species)
```

Well it seems that the first learning cycle didn't improve, so let the Neural Network play and play again with the data set as Romans said "Repetita Iuvant": repeating things help! Let's apply the default nnet package 100 repetition.

```{r}
set.seed(18)
irnn <- nnet(Species ~ ., data = iris_train, size = 1)
irpred <- predict(irnn, iris_train, type = "class")


table(iris_train$Species, irpred)
sum(iris_train$Species == irpred) / length(iris_train$Species)

```

Good the Neural Network improved a lot! Now it classifies the Iris species with 73% accuracy! At least in the iris_train. But now let's see if it can understand Species from a new data set, never seen before, not used for traning, the iris_test.

```{r}
irpred <- predict(irnn, iris_test, type = "class")
table(iris_test$Species, irpred)
sum(iris_test$Species == irpred) / length(iris_test$Species)
```

Good! We are 54% with only one neural node. Let's have a look at it, note how the weights changed as you can see with the size of the links between neural nodes.

```{r}
plot.nnet(irnn)
```

Now let's try with a more complex Neural Network, in this case we increase the number of Neural Nodes in the hidden layer to 3

```{r}
set.seed(18)
irnn <- nnet(Species ~ ., data = iris_train, size = 3)
irpred <- predict(irnn, iris_train, type = "class")


table(iris_train$Species, irpred)
sum(iris_train$Species == irpred) / length(iris_train$Species)
```

Very good improvement! We are now at 100% in the training set. This is not surprisingly we imporved, in fact the second models has higher number of neural nodes and connections among them. So basically we evolved the species!
Let's test it against testing set.

```{r}
irpred <- predict(irnn, iris_test, type = "class")
table(iris_test$Species, irpred)
sum(iris_test$Species == irpred) / length(iris_test$Species)

```

Doing very well! 96% Accuracy. Let's have a look at the Neural Net:

```{r}
plot.nnet(irnn)
```


# Conclusion
We have trained a simple Neural Network based on available data that teaches how to classify the Iris species based on sepal-petal length and width. The training consisted of tuning weigths in the Neural Network by going through the trainig set several time. This process is pretty similar to a human learning process, where the man is trained with a controlled environemnt and he can check his results with actual results. 

After the training is completed, meaning that the man is making a limited number of errors, he is ready to face real cases, that is cases where he doesn't know the answer and he has to apply what he learnt.

This is exaclty what we did here with the simple Neural Net exercize. Now the Neural Net can classify Iris. Note also the effect of more neural nodes in the hidden layer that improved the accuracy from 54% to 96%. Note also that an untrained netowork is providing random guess.









